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Abstract 



Let Abe an M by N matrix (M < N) which is an instance of a real random Gaussian ensemble. In compressed 
sensing we are interested in finding the sparsest solution to the system of equations Ax = y for a given y. In 

~~ i ■ general, whenever the sparsity of x is smaller than half the dimension of y then with overwhelming probability 
over A the sparsest solution is unique and can be found by an exhaustive search over x with an exponential time 

^3 ' complexity for any y The recent work of Candes, Donoho, and Tao shows that minimization of the l\ norm of 

O . x subject to Ax = y results in the sparsest solution provided the sparsity of x, say K, is smaller than a certain 
threshold for a given number of measurements. Specifically, if the dimension ofj approaches the dimension ofx, 
the sparsity ofx should be K < 0.239N. Here, we consider the case where x is d-block sparse, i.e., x consists of 

_l ■ n = N/d blocks where each block is either a zero vector or a nonzero vector. Instead of £\-norm relaxation, we 
consider the following relaxation 



min ||Xi||2 + HX2II2 + • • • + ||X„||2, subject to Ax = y (•) 



X 



o 
o 

qq ! where X,- — ( x ({-i)d+i/ x (i-i)d+2> ■ ■ ■ > x id) f or i — 1, 2, . . . , N. Our main result is that as n — > oo, (0) finds 
^^ . the sparsest solution to Ax = y with overwhelming probability in A, for any x whose block sparsity is k/n < 

y 2 — O(e), provided M/N > 1 — 1/d, and d = D.(log(l/e)/e). The relaxation given in (0) can be solved in 

polynomial time using semi-definite programming. 



1. Introduction 

Let A be an M by N instance of the real random Gaussian ensemble and x be an N dimensional signal from M N 
with sparsity K, i.e., only K elements of x are nonzero. Set y = Ax which is an M dimensional vector in IR M . 
In compressed sensing y is called the measurement vector and A the Gaussian measurement matrix. Compressed 
sensing has applications in many different fields such as data mining [14], error-correcting codes [12, 16, 18], DNA 
microarrays [13,33,34], astronomy, tomography, digital photography, and A/D converters. 

In general, when K ^C N one can hope that y = Ax is unique for large enough M which is much smaller than 
N. In other words, instead of sensing an N dimensional signal x with sparsity K we can measure M random linear 
functionals of x where MCN and find x by solving the under-determined system of equations y = Ax with the 
extra condition that x is K sparse. The reconstruction can be presented as the following optimization problem: 

min||x||o subject to Ax = y (1) 



where the £q norm or the Hamming norm is the number of nonzero elements of x. 

Define a = M/N and (3 = K/N. In [15], the authors show that if (3 > y 2 a then for any measurement matrix 
A one can construct different K sparse signals xj and X2 such that Ax\ = A\2- In addition, if j3 ^ l / 2 oc then 
there exists an A such that the K sparse solution to y = Ax is unique for any y; specifically, for random Gaussian 
measurements the uniqueness property holds with overwhelming probability in the choice of A. However, the 
reconstruction of x for a given y can be cumbersome. One of the fundamental questions in compressed sensing is 
whether one can efficiently recover x using an optimal number of measurements. 

1.1. Prior work 

A naive exhaustive search can reconstruct the K sparse solution x to the systems of equations y = Ax with 

O ( ( K )M 3 J complexity. Recently, Candes, Romberg, Tao and Donoho [10, 11,30], show that the Iq optimization 

can be relaxed to t\ minimization if the sparsity of the signal is K = 0(M/ log(N/M)). In this case, the sparse 
signal is the solution to the following t\ norm optimization with high probability in the choice of A: 

min ||x||i subject to Ax = y (2) 

X 

This optimization can be solved efficiently using linear programming. Faster algorithms were discovered in 
[1-3, 35]. For a comprehensive list of papers and results in compressed sensing please check [4]. 

Donoho and Tanner [5, 7, 8] determined the region (cc, j3) for which the t\ and £q coincide under Gaussian 
measurements for every (or almost every) X-sparse vector x. From a refinement of their result given in [45], when 
j3 approaches oc the sparsity has to be smaller than 0.239N. Notice that, ideally, one should be able to recover 
the signal if the sparsity is less than y 2 N. We have to mention that with Vandermonde measurements we can 
recover the sparse signal with optimal number of measurements efficiently [15]. However, it is not clear whether 
the resulting algorithms (which are variations of recovering a measure from its moments) are robust with respect 
to imperfections, such as noise [9,27-29]. Also, results similar to those valid for Gaussian matrices A have been 
established for several different ensembles, e.g., Fourier (see e.g., [11]). 

In this paper, we will focus on developing robust efficient algorithms that work for Gaussian measurements. 

1.2. Our main result 

We consider the reconstruction of block-sparse signals from their random measurements. A signal of dimension 
N which consists of n blocks of size d = N/n is k sparse if only k blocks of the signal out of n are nonzero. 
Such signals arise in various applications, e.g., DNA microarrays, equalization of sparse communication channels, 
magnetoencephalography etc. (see e.g., [33,34,36,39-41] and the references therein). We measure the signal with 
a md x nd random Gaussian matrix y = Ax. More on a scenario similar to this one the interested reader can 
find in e.g. [36-38,42,44]. Using the t\ relaxation for reconstructing x does not exploit the fact that the signal is 
block-sparse, i.e. that the nonzero entries occur in consecutive positions. Instead, different techniques were used 
throughout the literature. In [36] the authors adapt standard orthogonal matching pursuit algorithm (used normally 
in case k = 1) to the block-sparse case. In [37,38,42,43] the authors use certain convex or non-convex relaxations 
(mostly different from the standard l\) and discuss their performances. Generalization of the block-sparse prob- 
lem to the case when the number of blocks is infinite was considered in the most recent paper [44]. In this paper 
we consider the following convex relaxation for the recovery of x: 

min ||Xi||2 + HX2II2 + • • • + ||X„||2, subject to Ax = y (3) 



X 



where X, = (xt i _ 1 \ d+1 , X(,_ 1 w +2 , . . . , x^), for i = 1,2, ... ,n. We will analyze its theoretical performance and 
show that for a large enough d, independent of n, if a approaches one, fi can approach y 2 and the optimization of 



(0 will give the unique sparse solution with overwhelming probability over the choice of A for any y. We will 
also briefly outline how (0 can be posed as a semi-definite program and therefore solved efficiently in polynomial 
time by a host of numerical methods. Furthermore, we will demonstrate how (0 can be adapted for practical 
considerations. Numerical results that we will present indicate that in practice a modified version of (0 (given in 
Section |4| will even for moderate values of d be able to recover most of the signals with sparsity fairly close to the 
number of measurements. Before proceeding further we state the main result of this work in the following theorem. 

Theorem 1. Let A be an md x nd matrix. Further, let A be an instance of the random Gaussian ensemble. Assume 
that e is a small positive number, i.e., < e <C 1, d = Q(log(l/e)/e), a > 1 — 1/d, and [3 = l / 2 — O(e). Also, 
assume that n tends to infinity, m = an, and the block-sparsity ofx is smaller than fin. Then, with overwhelming 
probability, any d-block sparse signal x can be reconstructed efficiently from y = Ax by solving the optimization 
problem (fj]). 

Our proof technique does not use the restricted isometry property of the measurement matrix A, introduced 
in the work of Candes and Tao [11] and further discussed in [17], nor does it rely on the A:-neighborliness of the 
projected polytopes presented in the work of Donoho and Tanner [5,7, 8, 19]. Instead, we look at the null-space 
of the measurement matrix A and use a generalization of a necessary and sufficient condition given in [31] for the 
equivalence of £Q) and (0. 

We are able to use some probabilistic arguments to show that, for a random Gaussian measurement matrix, 
(0 given below holds with overwhelming probability. In our proof we use a union bound to upper bound the 
probability that (@]) fails; this makes our bound loose for oc less than one. We expect to get sharp thresholds 
for other values of oc by generalizing the idea of looking at the neighborliness of randomly projected simplices 
presented in [5,7, 8]. However, for relaxation in (0 instead of simplices we have to work with the convex hull B 
of n rf-dimensional spheres. Specifically, one would need to compute the probability that a random //-dimensional 
affme plane that passes through a point on the boundary of B will be inside the tangent cone of that given point. 
Solving this problem seems to be rather difficult. 

2. Null-space characterization 

In this section we introduce a necessary and sufficient condition on the measurement matrix A so that the 
optimizations of £T|) and (0 are equivalent for all fc-block sparse x. (see [24-26, 31] for variations of this result). 
Throughout the paper we set Jf to be the set of all subsets of size k of {1, 2, . . . , n} and by K we mean the 
complement of the set K C Jf with respect to {1, 2, ... , n}, i.e., K = {1, 2, . . . , n}\/C. 

Theorem 2. Assume that A is a dm x dn measurement matrix, y = Ax and x is k-block sparse. Then (0 
coincides with the solution o/fTJ]) if and only if for all nonzero w £ M where Aw = and all K G J(f 

L ||W,-|| 2 < £ ||W;|| 2 (4) 

iEK i<EK, 

where W ; - = (w ( ,_ 1)i+1/ w (i _ 1)d+2 , . . . , Vf M ),for i = 1, 2, . . . , n. 

Proof. The proof goes along the same line as the proofs in [24-26, 31]. The only difference is that each com- 
ponent of the vector is now replaced by the two norm of the subvector. First we prove that if (0) is satisfied then 
the solution of (0 coincides with the solution (0. Let x be the solution of (0 and let x be the solution of (0. 
Further, let X { = (x (! _ 1)d+1/ x {i _ 1)d+2 , ..., x id ), for i = 1,2, . . . , n and X,- = (x (i _ 1)d+1/ x (i _ 1)d+2/ • • • , x id ), for 



i = 1,2, . . . ,n. Set /C to be the support of x, then we can write 

£ 11**112 = £ H^-Xi + Xfllz 

! = 1 1=1 

= £ ||x,--x ! - + x ! -|| 2 + £ ||^-X,- + Xf|| 2 

i e k ieic 

= £ ||Xj-X i + X i || 2 + £ HXi-XiHz 

i G K iGK 

> LIIX/II2- £ ||X l --X i || 2 + £ HXj-Xilb. (5) 

i'=l i G /C « G C 

Since x — x lies in the null-space of A, we have Li G £ I |X; — X,| | 2 < Li g a: I l^< ~~ ^'1 k- Thus, © implies 
LiLi I l^'l I2 > LiLi I |*i| b> which is a contradiction. Therefore, x = x. Now we prove the converse. Assume (O 
does not hold. Then there exists w G R nd , Aw = 0, w = (£), wi G R kd , w 2 G EC"-*)'' such that wi is /c-block 
sparse and Li G £ I \^i\ I2 ^ Li G £ I |W ( '| | 2 , where /C is the support of v/\. Take x = ( w 1 ) and y = Ax. Since w 
is in the null-space of A, y = A( w ). Therefore we have found a signal ( w ) which is not /c-block sparse and 
has smaller norm than the /V-block sparse ( w 1 )- | 

Remark. We need not to check (0]) for all subsets /C; checking the subset with the k largest (in two norm) blocks 
of w is sufficient. However, the form of Theorem|2]will be more convenient for our subsequent analysis. 

Let Z be a basis of the null space of A, so that any dn dimensional vector w in the the null-space of A can be 
represented as Zv where v G M. d ( n ~ m \ For any v G R d ( n_m ) write w = Zv. We split w into blocks of size d, 
W,- = (vr (i-i)d+it w (i-i)d+2/ • • • / w irf)> f° r i = 1,2, . . . ,n. Then, the condition (0]) of Theorem|2]is equivalent to 

£ W, < £ W,-, for any v G R^"-" 7 ) and /C G X, where w = Zv. (6) 

i G /C i G £ 

We denote by I v the event that © happens. In the following we find an upper bound on the probability that I v 
fails as n tends to infinity. We will show that for certain values of a, ft, and d this probability tends to zero. 

Lemma 3. Let A G R dmxdn be a random matrix with i.i.d. J\f(0, 1) entries. Then the following statements hold: 

• The distribution of A is left-rotationally invariant, Pa(A) = P^(AO), 00* = ©*© = I 

• The distribution ofZ, any basis of the null-space of A is right-rotationally invariant. Pz(Z) = Pz(©*Z), 
00* = 0*0 = I 

• It is always possible to choose a basis for the null-space such that Z G R x d \ n ~ m ) has i.i.d. J\f (0,1) 
entries. 

In view of Theorem|2] and LemmaO for any A whose null-space is rotationally invariant the sharp bounds of [6], 
for example, apply (of course, if k = 1). In this paper, we shall analyze the null-space directly. 

3. Probabilistic analysis of the null-space characterization 

Assume Z is an dn x d(n — m) matrix whose components are i.i.d. zero-mean unit-variance Gaussian random 
variables. Define Z, to be the matrix which consists of the {(z — l)d + 1, (i — l)d + 2, . . . , id] rows of Z and 



define Z{j to be the j-th column of Z,. Let a = l-y,0<y<l where y is a constant independent of n. Then 
we will find a d such that (3 — > y 2 and 

lim P(X) = 1. (7) 

n^oo 

Proving © is enough to show that for all random matrix ensembles which have isotropically distributed null- 
space, © with overwhelming probability solves CQ). In order to prove (0 we will actually look at the complement 

of the event I v and we show that 

def 



lim P f = lim P(I V ) = 0, 
where I v denotes the complement of the event I v . Using the union bound we can write 



(8) 




3ve 



n—m) 



■ L Il z ' v ll2 ^ L H z ' v l 



(9) 



iEK 



i<EK, 



Clearly the size of J^ is (T). Since the probability in (© is insensitive to scaling of v by a constant we can restrict 
v to lie on the surface of a shape C that encapsulates the origin. Furthermore, since the elements of the matrix Z 
are i.i.d. all (?) terms in the first summation on the right hand side of (© will then be equal. Therefore we can 
further write 

k n \ 




iZivl 



(10) 



i=*+l 



The main difficulty in computing the probability on the right hand side of (fTOl is in the fact that the vector v is 
continuous. Our approach will be based on the discrete covering of the unit sphere. In order to do that we will use 
small spheres of radius e. It can be shown [18,20,21] that e ~ d ("- m ) spheres of radius e is enough to cover the sur- 
face of the din — m) -dimensional unit sphere. Let the coordinates of the centers of these e - d \ n - m ) small spheres 
be the vectors z t , t = 1,2,. . . ,e~ d{ - n ~ m \ Clearly, ||z f || 2 = Vl - e 2 . Further, let S t ,t = 1,2, . . . , e - d ("-'") 
be the intersection of the unit sphere and the hyperplane through z t perpendicular on the line that connects z t 
and the origin. It is not difficult to see that Uf=i $t forms a body which completely encapsulates the origin. 
This effectively means that for any point v such that | |v| | > 1, the line connecting v and the origin will intersect 



Ut=i St- Hence, we set C 



Ut=i St and apply union bound over St to get 



P/< 



-d(n-m) max 



' k n N 

3veS f : £H Z ' v ll2^ Yj Il z ! v ll2 

, i=\ i=k+l , 



(ID 



Every vector v G St can be represented as v = z t + e where | |e| | 2 ^ e. Then we have 



max 

t 



P [3veS t : £||Z;v|| 2 ^ £ ||Z,-v| 



/=i 



i=k+l 



max 

t 



P 3e : ||e|| 2 ^eand £ ||Z;(z f + e)| | 2 ^ £ ||Z,-(z, + e)|| 2 
V i=l t=k+l y 



(12) 



Given the symmetry of the problem (i.e. the rotaional invariance of the Z,) it should be noted that, without loss of 
generality, we can assume z t = [| \z t \ | 2 , 0, 0, ... , 0]. Further, using the results from [23] we have that ^("-"O- 1 
points can be located on the sphere of radius ce centered at z t such that St (which lies in a (d(n — m) — 1)- 
dimensional space and whose radius is e) is inside a polytope determined by them and 



c € < 



d-H^v^P^ 



Mi+^-V 



if tj < Vl 



otherwise. 



(13) 




Figure 1. Covering of the unit sphere 

To get a feeling for what values r\ and c can take we refer to [22] where it was stated that 3 ( H"- m )- 1 points can 
be located on the sphere of radius v/fe centered at z t such that St is inside a polytope determined by them. 

Let us call the polytope determined by jf{n-m)-l points P t . Let ef , 6 = 1,2,..., rjd{n-m)-l be its ^("-"'M 
corner points. Since | |Z,-Zf 1 1 2 — | |Z,e| | 2 ^ | |Z,(z f + e) 1 12, and St C Pt we have 

k n 

maxP(3e, ||e|| 2 < e s. t. £ ||Z;(z t + e)|| 2 > £ ll z *( z f + e )ll2) 
1 i=i i=k+l 

k n 

^ maxP(3e / (z t + e)eP f s. t. £ ||Z { (z t + e)|| 2 ^ £ (||Z;z t || 2 - ||Z !e || 2 )) 



!=1 



i=k+l 



sC max P max £ ||Zfef|| 2 + £ ||Z;(z f + e ?)|| 2 ]> £ ||ZiZ f || 2 . (14) 

f V S \i=k+l i=l / i=fc+l / 

where the second inequality follows from the property that the maximum of a convex function over a polytope is 
achieved at its corner points and that function inside the max, is convex as it is a sum of convex norms. Connecting 
(fTTT ). (fT2l) . and flU) we obtain 



CD 



k 

/=1 



i=k+l 



P /^^^ maxP K ax (.L l|Z.-e?ll2 + El|Z.-(^ + e?)|| 2 ] ^ IJlZiZtHz) . (15) 

Using the union bound over s we further have 

rnaxP max £ ||Zjef|| 2 + £ ||Z;(z f + e?)|| 2 I > £ ||Z,-Zf|| 2 



J=k+1 



k 
I 

!=1 



t=fc+l 



_d(n-m)-l 



^ max 



ax £ P £ ||Z»ef|| 2 + £||ZK^ + ef')|| 2 U£ ||Z, 



s'=l 



>/=/c+l 



c 

/=1 



;Zf 2 



(16) 



i=Jfc+l 



Given that only the first component of z t is not equal to zero and the symmetry of the problem we can write 

n d(n-m)-l ( ( n k \ n \ 

max £ P £ ll^ef|| 2 + £||Z;(z f + ef)|| 2 U £ ||Z,-z t || 2 

1 s'=l V \«=*+l »=1 / i=fc+l / 

(/ „ d(n-m) k \ n \ 

I II I Z l7 (ef) ; || 2 + £||Z ! (z f + er)|| 2 U £ ||^-|| 2 (|| Zf |U- l(ef)il) 
\i=k+l ;=2 !=1 / i=fc+l / 

(17) 

where (e s t )j denotes j-th components of ef . Let B ; - = Z,(z f + ef ), Q = Z,i(||zf|| 2 — | (ef )i|), and D, = 

Y,t=z m Zij(e s t )j. Clearly, B,-, Q, D, are independent zero-mean Gaussian random vectors of length d. Then we 
can rewrite ( TP71 ) as 

^(«-m)-l // „ fc \ » \ 

m f x I P I l|Z ( ef||2 + £||Z,(z f + ef)|| 2 U £ ||Z ; z t || 2 

1 s '=l V V=*:+l i=\ J i=k+\ J 

^ (^"-^-Vaxpf £ HDiHz + £1^-112^ £ HQllzJ. (18) 

Let B,- Q , and D; denote the p-th components of the vectors B,-, Q, D,-, respectively. Then for any 1 ^ p ^ d 
it holds 

var(B !p ) = ||* + ef'||l = 1 - e 2 + ce\ var(Q p ) = (||z f || 2 - |(ef ) 1 |) 2 , var(D,,) = ||ef' ||| - | (ef' ) x \\ 

Let Gj, F, be independent zero-mean Gaussian random vectors such that such that for any 1 ^ p ^ d 

var(G !F ) = (||z|| 2 -||er|| 2 ) 2 / var(F !p ) = ||ef||i. 

Since var(G ;j ) ^ var(C, ), and var(F, ) ^ var(D, ) we have from (fT8T ) 

jfin-m)-! p( £ | | D ;| | 2 + £ | |B,-| | 2 ^ £ ||Q|| 2 ] 

< ri^-VaxP f 11^112 + £||Bi||2^ £ ||G-|| 2 ). (19) 

f ' s \!=Jc+l 1=1 i=fc+l / 

Since ||ef || 2 does not depend on t, s', the outer maximization can be omitted. Furthermore, ||e| || 2 = ce. Using 
the Chernoff bound we further have 



n d(n-m)-l p £|| B ,|| 2 ^ £ (||Q|| 2 -||F,|| 2 ) 
\t=l i=k+l J 

^ ^d(n-m)-lr-£ e ^\\B 1 \\ 2 \kr Ee -ii\\G 1 \\2\n- k (E e ^\\ F i\\2\n-k_ ^20) 

where \x is a positive constant. Connecting ([T5T)-(l20l) we have 

(n\ 1 /n\ d ( n - m ) nun i ( Ee _ H|Gill2 \" 



After setting k = fin, m = an, and using the fact that (?) « e " H ^' we finally obtain 

lim P f sC lim £," (22) 

n— >oo •* n^oo 

where 

_ (T ,/ e) '(i-"> „ Bi „ fl / e^miicxm, y-* 

and H(/3) = /3 In jS + (1 - j3) ln(l - /3). We now set jU = V2rf - l<5-\/2, <5 < 1. In the appendices we will 
determine Ee^ 21 ^ 5 ^^, Ee v / 2^T5 V / 2||F 1 || 2) and Ee -V2d^TsV2\\ Gl \\ 2 _ 

We now return to the analysis of (l23l) . Replacing the results from (I37T ). (|38T >. and (l44l in (|23T ) we finally have 

W(l-a) 



£ _ in/£)^l_I^((^) 2 +^)^( e rf((^/) 2 +5/))l-^( e d((^) 2 -5g))l-/3 ( 24) 



where we recall that & = \/l — e 2 + c 2 e 2 , / = ce, and g = yl — e 2 — ce. Our goal is to find d such that for 
a — 1 — y, < y ^C 1 and fi = ^ — cr,0 < a ^ ^, i, < 1. That means we need 

ln(£) < (25) 

which implies 

d{\ - a) ln(^) + d«5(j3& + (1 - j3)/ - (1 - j3)g) + d5 2 (l3b 2 + (1 - £)/ 2 + (1 - fog 2 < H(j3). (26) 

Let 

g - / 1 - 2ce 

^o P t - ^37 ~ 2"3^ ( 27 ) 

Combining the previous results the following theorem then can be proved. 

Theorem 4. Assume that the matrix A has an isotropically distributed null-space and that the number of rows of 
the matrix A is dm = adn. Fix constants c and r\ according to U3i and arbitrarily small number e and 5. Let 
b = yl — e 2 + c 2 e 2 , / = ce, and g = yl — e 2 — ce. Choose (3 < $ pt where fiopt = \ ~ 0(e) is given by 
rt27D . For any x that is d-block sparse and has block sparsity k < fin, the solutions to the optimizations ([7]) and 

d3) coincide if 

H(jS)-ln(2) 1 

d > TT^i /, n ^r /-, 7XT and a > 1 - - (28) 

6(l3b + (l-l5)f-(l-fi)g) d 

Proof. Follows from the previous discussion combining ([8]>, (l22l) . (|23T ). (l24l . (|25T ). and (l26l ). | 

Before moving on to the numerical study of the performance of the algorithm (0 we should also mention that 
the theoretical results from [10] and [45] are related to what is often called the strong threshold (the interested 
reader can find more on the definition of the strong threshold in [10]) for sparsity. As we have said earlier, if the 
number of the measurements is M = aN then the strong threshold for sparsity is ideally K = jN. Also, the 
definition of the strong threshold assumes that the reconstructing algorithm (©, ([3]> or any other) succeeds for any 
sparse signal with sparsity below the strong threshold. However, since this can not be numerically verified (we 
simply can not generate all possible k block sparse signals from M. dn ), a weaker notion of the threshold (called 
the weak threshold) is usually considered in numerical experiments (the interested reader can also find more on 
the definition of the weak threshold in [10]). The main feature of the weak threshold definition is that it allows 
failure in reconstruction of a certain small fraction of signals with sparsity below it. However, as expected, the 
ideal performance in the sense of weak threshold assumes that if the number of the measurements is M = aN 
and the sparsity is K = fiN, then ft should approach a. As the numerical experiments in the following sections 
hint increasing the block length d leads to almost ideal performance of the reconstructing technique given in ©. 

8 



Table 1. The theoretical and simulation results for recovery of block-sparse signals with different 
block size. p$ is the strong threshold for t\ optimization and pw is the weak threshold for t\ 
optimization both are found from [5,6]. d represents the block size in various simulations. The data 
are taken from the curves with probability of success more than %95. 





5 = 0.1 


5 = 0.3 


<5 = 0.5 


5 = 0.7 


5 = 0.9 


Ps 


0.049 


0.070 


0.089 


0.111 


0.140 


Pw 


0.188 


0.292 


0.385 


0.501 


0.677 


d = \ 


0.10 


0.23 


0.30 


0.41 


0.62 


d = A 


0.30 


0.33 


0.50 


0.57 


0.72 


d = 8 


0.50 


0.60 


0.60 


0.71 


0.89 


d = 16 


0.70 


0.80 


0.80 


0.91 


0.94 



4. Numerical study of the block sparse reconstruction 

In this section we recall the basics of the algorithm, show how it can efficiently be solved in polynomial time, 
and demonstrate its performance through numerical simulations. 

In order to recover a k block sparse signal x from the linear measurements y = Ax we consider the following 
optimization problem 



V i V 

| A l||2 + ||-*2||2 

subject to Ax = y 



min 

X 



IX 



(i 2 



(29) 



where X, = (xn_ 1 w+i/ X(j-i)d+2/ • • • / x id)> for i = 1,1, ... , n. Since the objective function is convex this is 
clearly a convex optimization problem. In principle this problem is solvable in polynomial time. Furthermore, we 
can transform it to a bit more convenient form in the following way 



mm 

X,tl,t 2 ,:;t„ 



!=1 

subject to ||Xi||2 < t], ti^O, l^i^n 
Ax = v 



(30) 



where as earlier X,- = (x/,_ 1 y +1/ X(,_ 1 ^ +2 / • • • / x z'd)> f° r i = 1,2, . . . ,n. Finally, it is not that difficult to see that 
(l30l can be transformed to 



mm 

X,t 1 ,t 2 ,:;t„ 



subject to 







\ki 


x*~ 


[Xi 


ti_ 



Ax 



^0, tj^O, l^i^n 



(31) 



with X,- = (x(,_ 1 ) d+1/ X( ! _ 1 ) d+2 , . . . , x^), for i = 1,2, . . . ,n. Clearly, (I3TI ) is a semi-definite program and can be 
solved by a host of numerical methods in polynomial time. 

To further improve the reconstruction performance we introduce an additional modification of (|3~TT) . Assume 
that X,v 1 ^ i ^ n is the solution of (|3TT) . Further, sort | |X,-| (2 and assume that /C is the set of k indices which 
correspond to the k vectors X, with the largest norm. Let these indices determine the positions of the nonzero 



Algorithm 1 Recovery of block-sparse signals 



Input: Measured vector y G M. m , size of blocks d, and measurement matrix A. 
Output: Block-sparse signal xGf". 
1: Solve the following optimization problem 



min IIX1II2 + IIX2II2 H h 1 1 >C„ 1 1 2 

X 

subject to Ax = y 

using semi-definite programming. 
2: Sort ||X f || 2 fori = l,2,...,n, such that \\X h \\ 2 ^ ||X /2 || 2 >■■-> ||X/J 2 . 
3: The indices j\, ]i, ...,jd mark the blocks of x that are nonzero. Set A to be the submatrix of A containing 

columns of A that are correspond to blocks j\, ]2, ■ ■ ■ , jd- 
4: Let x represent the corresponding nonzero blocks of x determined by j\, ]i, . . . , jd- Set x = A _1 y and the 

rest of blocks of x to zero. 
5: return x. 

blocks. Then let A^ be the submatrix of A obtained by selecting the columns with the indices /C from the first k 
rows of A. Also let y^ be the first kd components of y. Then we generate the nonzero part of the reconstructed 
signal x as x^ = A^y^. We refer to this procedure of reconstructing the sparse signal x as ^2/^1 algorithm and 
in the following subsection we show its performance. 

4.1. Simulation results 

In this section we discuss the performance of the £2/^1 algorithm. We conducted 4 numerical experiments for 
4 different values of the block length d. In cases when d = 1, 4, or 8 we set the length of the sparse vector to be 
N = 800 and in the case d = 16 we set N = 1600. For fixed values of d and N we then generated a random 
Gaussian measurement matrix A for 0.1 ^ a ^ 0.9. For each of these matrices we randomly generate 100 
different signals of a given sparsity (3, form a measurement vector y, and run the £2/^1 algorithm. The percentage 
of success (perfect recovery of the sparse signal) is shown in Figure [2] and Table 1. The case d = 1 corresponds 
to the basic t\ relaxation. As can be seen from Figure [2] increasing the block length significantly improves the 
threshold for allowable sparsity. 

5. Conclusion 

In this paper we studied the efficient recovery of block sparse signals using an under-determined system of 
equations generated from random Gaussian matrices. Such problems arise in different applications, such as DNA 
microarrays, equalization of sparse communication channels, magnetoencephalography, etc. We analyzed the 
minimization of a mixed ^2/^1 typ e norm, which can be reduced to solving a semi-definite program. We showed 
that, as the number of measurements approaches the number of unknowns, the £2/^1 algorithm can uniquely 
recover any block-sparse signal whose sparsity is up to half the number of measurements with overwhelming 
probability over the measurement matrix. This coincides with the best that can be achieved via exhaustive search. 
Our proof technique (which involves a certain union bound) appears to give a loose bound when the number of 
measurements is a fixed fraction of the number of unknowns. For future work it would be interesting to see if one 
could obtain "sharp" bounds on when signal recovery is possible (similar to the sharp bounds in [8]) for £2/^1 
method. 
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Figure 2. Threshold for (3 for a given a (the colors of the curves indicate the probability of success 
of £2/^1 algorithm calculated over 100 independent instances of d-block sparse signals x and a fixed 
Gaussian measurement matrix A) 



References 

[1] G. Cormode and S. Muthukrishnan. Combinatroial algorithms for compressed senisng. SIROCCO, 2006. 

[2] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. Algorithmic linear dimension reduction in the 11 norm for 
sparse vectors. 44th Annual Allerton Conference on Communication, Control, and Computing, 2006. 

[3] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. One sketch for all: fast algorithms for compresses sensing. 
Proceedings of the Sympoisum on Theory of Computing. 2007. 

[4] Rice DSP Group. Compresses sensing resources. Available at http: // www. dsp . ece . rice . edu/cs 

[5] J. Tanner and D. Donoho. Thresholds for the Recovery of Sparse Solutions via LI Minimization. Proceedings 
of the Conference on Information Sciences and Systems, March 2006. 

[6] David Donoho and Jared Tanner. Neighborliness of randomly -projected simplices in high dimensions. Proc. 
National Academy of Sciences, 102(27), pp. 9452-9457, 2005. 



11 



[7] J. Tanner and D. Donoho. Sparse Nonnegative Solutions of Underdetermined Linear Equations by Linear 
Programming. Proceedings of the National Academy of Sciences, 102(27), pp. 9446-9451, 2005. 

[8] D. L. Donoho. High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimen- 
sion. Disc. Comput Geometry (online), December 2005. 

[9] R. DeVore. Optimal computation. Proceedings of the International congress on Mathematics, 2006. 

[10] D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4), pp. 1289-1306, 2006. 

[11] E. Candes, J.Romberg, and T. Tao. Robust uncertainty princples: Exact signal reconstruction from highly 
incomplete frequency information. IEEE Transactions on Information Theory, 52, pp. 489-509, 2006. 

[12] E. J. Candes, M. Rudelson, T. Tao and R. Vershynin. Error correction via linear programming. Proceedings 
of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 295-308. 

[13] O. Milenkovic, R. Baraniuk, and T. Simunic-Rosing. Compressed sensing meets bionformatics: a new DNA 
microarray architecture. Information Theory and Applications Workshop, San Diego, 2007. 

[14] P. Indyk. Explicit Constructions for Compressed Sensing of Sparse Signals, manuscript, July 2007. 

[15] M. Akccakaya and V. Tarokh. Performance Bounds on Sparse Representations Using Redundant Frames. 
larXiv:c s/0703045l 

[16] Emmanuel Candes and Terence Tao. Decoding by linear programming. IEEE Trans, on Information Theory, 
51(12), pp. 4203 - 4215, December 2005. 

[17] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted 
isometry property for random matrices. To appear in Constructive Approximation, available online at 

|http: //www, dsp. ece . rice . edu/cs| 

[18] M. Rudelson and R. Vershynin. Geometric approach to error correcting codes and reconstruction of signals. 
International Mathematical Research Notices, 64, pp. 4019 - 4041, 2005. 

[19] A.M. Vershik and P.V. Sporyshev. Asymptotic Behavior of the Number of Faces of Random Polyhedra and 
the Neighborliness Problem. Selecta Mathematica Sovietica, vol. 11, No. 2 (1992). 

[20] A.D. Wyner. Random packings and coverings of the unit N-sphere. Bell systems technical journal, vol. 46, 
2111-2118, 1967. 

[21] I. Dumer, M.S. Pinsker, and V.V. Prelov. On the thinest coverings of the spheres and ellipsoids. Information 
Transfer and Combinatorics, LNCS 4123, pp. 883-910, 2006. 

[22] I. Barany and N. Simanyi. A note on the size of the largest ball inside a convex polytope. Periodica Mathe- 
matica Hungarica, vol. 51 (2), 2005, pp. 15-18. 

[23] K. Boroczky, Jr. and G. Wintsche. Covering the sphere by equal spherical balls, in Discrete and computational 
geometry: the Goodman-Pollach Festschrift (ed. by S. Basu et ah), 2003, 237-253. 

[24] A. Feuer and A. Nemirovski. On sparse representation in pairs of bases. IEEE Transactions on Information 
Theory, 49(6): 1579-1581 (2003). 

[25] N. Linial and I. Novik. How neighborly can a centrally symmetric polytope be? Discrete and Computational 
Geometry, 36(2006) 273-281. 

12 



[26] Y.Zhang. When is missing data recoverable. available online at, 

|http: //www, dsp. ece.rice. edu/cs| . 

[27] J. Haupt and R. Nowak, Signal reconstruction from noisy random projections. IEEE Trans, on Information 
Theory, 52(9), pp. 4036-4048, September 2006. 

[28] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. Proc. Allerton Con- 
ference on Communication, Control, and Computing, Monticello, IL, September 2006. 

[29] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk. Random filters for compressive sampling and 
reconstruction. Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, 
France, May 2006. 

[30] Emmanuel Candes. Compressive sampling. Proc. International Congress of Mathematics, 3, pp. 1433-1452, 
Madrid, Spain, 2006. 

[31] M. Stojnic, W. Xu, and B. Hassibi, Compressed sensing - Probabilistic analysis of a null space characteriza- 
tion, to be presented at ICASSP, 2008. 

[32] N. Temme. Numerical and asymptotic aspects of parabolic cylinder functions. Modeling, analysis and simu- 
lation, MAS-R0015, 2000. 

[33] S.Erickson, and C. Sabatti. Empirical Bayes estimation of a sparse vector of gene expression. Statistical 
Applications in Genetics and Molecular Biology, 2005. 

[34] H. Vikalo, F. Parvaresh, S. Misra and B. Hassibi. Recovering sparse signals using sparse measurement matri- 
ces in compressed DNA microarrays. Asilomor conference, November 2007. 

[35] W. Xu and B. Hassibi. Efficient compressive sensing with deterministic gaurantees using expander graphs. 
Proceedings of the IEEE Infromation Theory Workshop, Lake Taho, September 2007. 

[36] J. Tropp, A. C. Gilbert, and M. Strauss. Algorithms for simultaneous sparse approximation. Part I: Greedy 
pursuit. Signal Processing, Aug. 2005. 

[37] J. Tropp. Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Processing, 
Aug. 2005. 

[38] J. Chen and X. Huo. Theoretical results on sparse representations of multiple-measurement vectors. IEEE 
Trans, on Signal Processing, Dec. 2006. 

[39] S. Cotter and B. Rao. Sparse channel estimation via matching pursuit with application to equalization. IEEE 
Trans, on Comm, Mar. 2002. 

[40] I.F. Gorodnitsky, J.S. George, and B. Rao. Neuromagnetic source imaging with focuss: a recursive weighted 
minimum norm algorithm. J. Electroencephalogr. Clin. Neurophysiol, Oct. 1995. 

[41] D. Malioutov, M. Cetin, and A. Willsky. Source localization by enforcing sparsity through Laplacian prior: 
an SVD based approach. IEEE Trans, on Signal Porcessing, Oct. 2003. 

[42] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutions to linear inverse problems with 
multiple measurement vectors. IEEE Trans, on Signal Porcessing, July 2005. 

[43] I. Gorodnitsky and B.Rao. Sparse signal reconstruction from limited daat using FOCUSS: A re-weighted 
minimum norm algorithm, it IEEE Trans, on Signal Porcessing, Mar. 1997. 

13 



[44] M. Mishali and Y. Eldar. Reduce and Boost: Recovering arbitrary sets of jointly sparse vectors, available 
online at, |http: //www, dsp. ece.rice. edu/cs| 

[45] C. Dwork, F. McSherry and K. Talwar. The price of privacy and the limits of LP decoding. STOC '07: 
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, New York, NY, 2007. 



A. Computing E 



,V'2d=W2||Bi||2 



and E 



, y /2d=16V2\\F 1 \\ 2 



Now we turn to computing E e V2i-l*v^||Bi|| 2 and Ee V2d-isV2\\F 1 \\ 2 _ Let us fj rst con sider Ee^ B ^ z . Since B x is 
a d dimensional vector let B\ = \B\ lt B\ 2 , . . . , £>ij. As we have stated earlier B\ , 1 ^ p ^ d are i.i.d. zero-mean 
Gaussian random variables with variance var(£>i ) = 1 — e 2 + c 2 e 2 = b 2 , 1 ^ p ^ d. Then we can write 



1 POO /*oo / 

Ee V2d^6V2\\ Bl \\ 2 = _J_ /■ / / ^u^ldVlb 

+ /2.7T J — oo i-oo I 



N 



IX 



^=i B ?, 



dBi. 



P =i 



Using the spherical coordinates it is not that difficult to show that the previous integral can be transformed to 



£ e V^^T<5v/2||Bi|| 2 



! 2 V^ [°° v d-l V2d^lsV2br-4- 



2n T(f) 7o 



r e v 



2dr 



/"°° r d-l e V2d=16V2br-£ dr 

Jo 



(2d-l)(6b) 2 (2d-l)(bb) 2 

v(d)e 2 e 2 

r(|)2l- 1 r(d) 

(2rf— l)(5b) 2 

n|_i_ u( ^_i,_^r,^) 



(32) 



where IT is parabolic cylinder function (see e.g., [32]). Before proceeding further we recall the asymptotic results 
for U from [32]. Namely, from [32] we have that if C S> and t ^ 



2 r(^±i)(f 2 + i)i 



(33) 



where 



h(C) =l-^-\<T^&-\, p = -(ty/i + t 2 + ln(f + y/l + t 1 )). 



(34) 



; From (1331 and (1341) we have 



U(^A-v / 27^IW2) « -L ( (2rl-^v/2J l^"-2- 



, v e ^d(K56Vl+W 5 +ln(»+V / l+W))) 



Connecting (1321 and 051 we finally obtain for d 3> and 5 <C 1 (<5 is a constant independent of d) 



(35) 



^v^^W^HBilh 



2i-l/«A2 



(»)* 



r(f)2f-i 



(2e)-^v / 27^T 2 2 2-i)^ - 7 - T - 1 . (36) 



;i + (^) 2 
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Using the facts that ft<l and V{i) ~ ( ^ ) 2 when d is large, (l36l > can be rewritten as 
Since 6£> <$; 1 it further follows 



Ee V2d=l&V2\\B^\\ 2 



g a£l((«0*+») 

<f((<5fo) 2 +,5b) 



(37) 



To compute EeWl fl H 2 we first note that Fi is a d dimensional vector. Let F\ = [Fi ir F\ 2 , . . . ,F\ d \. As we have 
stated earlier F\ , 1 ^ p ^ d are i.i.d. zero-mean Gaussian random variables with variance var(Fi 



2^2 



<re 



/ 2 , 1 ^ p ^ d. Then the rest of the derivation for computing Ee^ 2d 1<5 v 2 ll F ill2 follows directly as in the case of 
E gv'2d-i5v^||Bi|| 2 _ Hence we can write similarly to (l37l) 



^gV^d^Tfiv^HFiHz 



p d((5fr+5f) 



(38) 



B. Computing E 



-v'2d=T<5\/2||Gi||2 



Now we turn to computing £e ^ 2rf 1(5 v2||G 1 || 2i since g 1 [ s a d dimensional vector let G\ = [G\ lt G\ v . . . , GiJ. 
As we have stated earlier Gi , 1 ^ p ^ d are i.i.d. zero-mean Gaussian random variables with variance 
var(Gi ,) = (yl — e 2 — ce) 2 = g 2 , 1 ^ p ^ d. Then we can write 



£ e -V2d=T$V2||Gi|| 2 



2tt 



exp -\/2d- \6\flg 



\ 



d T d i G? 



Similarly as in the previous subsection using the spherical coordinates it is not that difficult to show that the 
previous integral can be transformed to 



Ee -V2d^l5V2\\G 1 \ 



2^ 



'In 



4 r(; 



r d-l e -V2d-lsV2gr-^^ r 



_,,, (2rf-l)(^) 2 (2rf-l)fe) 2 

r(fl)e 2 e 2 

r(|)24-i r(d) 

(2ri-l)fe) 2 

r(a)e 2 



/■°° r d-l e -v , 2d-l<5V2gr-4^ r 
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r(|)2i 



— U(^2-^,V27^W2g) 



(39) 



where as earlier 17 is parabolic cylinder function. Before proceeding further we again recall another set of the 
asymptotic results for U from [32]. Namely, from [32] we have that if C 3> and t ^ 



C 2 /- hide-?? 
2 (t 2 + l)i 



(40) 
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where 

h(C) = 2^-h^r^~K P = \{ty/l + t 1 + ln(f + Vl + t 2 )). (41) 

^From d40l ) and (l4TT > we have 



Connecting d39l ) and (1421 we finally obtain for d 3> and <5 <C 1 (as earlier 5 is a constant independent of d) 

r(4)2«- 1 V / 



.i \ e 



^r 1 (l(^V /T +(^) 5 + ln (^+V /T +(W))) 



r(|)2§- 1 V 7 (i + (W 

Using the facts that T(|) m (^)s and F(d) sa (|) rf when d is large, (03]) can be rewritten as 
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Since 6^ <^C 1 it further follows 
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